
    Online Mutual Foreground Segmentation for Multispectral Stereo Videos

    Full text link
    The segmentation of video sequences into foreground and background regions is a low-level process commonly used in video content analysis and smart surveillance applications. Using a multispectral camera setup can improve this process by providing more diverse data to help identify objects despite adverse imaging conditions. The registration of several data sources is, however, not trivial if the appearance of objects produced by each sensor differs substantially. This problem is further complicated when parallax effects cannot be ignored, as is the case with close-range stereo pairs. In this work, we present a new method to simultaneously tackle multispectral segmentation and stereo registration. Using an iterative procedure, we estimate the labeling result for one problem using the provisional result of the other. Our approach is based on the alternating minimization of two energy functions that are linked through the use of dynamic priors. We rely on the integration of shape and appearance cues to find proper multispectral correspondences, and to properly segment objects in low-contrast regions. We also formulate our model as a frame processing pipeline using higher-order terms to improve the temporal coherence of our results. Our method is evaluated under different configurations on multiple multispectral datasets, and our implementation is available online. Comment: Preprint accepted for publication in IJCV (December 2018).
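
    As a rough illustration of the alternating scheme described above, the sketch below solves two toy labeling problems in turn, each one reusing the other's provisional result as a dynamic prior. The energy minimizers are hypothetical placeholders, not the paper's actual multispectral segmentation or stereo terms.

        import numpy as np

        def minimize_segmentation_energy(frame, disparity_prior):
            # Placeholder data term: threshold on intensity, nudged by the current disparity map.
            return (frame + 0.1 * disparity_prior) > frame.mean()

        def minimize_stereo_energy(left, right, segmentation_prior):
            # Placeholder matching cost: absolute difference, weighted differently inside foreground regions.
            cost = np.abs(left - right)
            return np.where(segmentation_prior, cost, 0.5 * cost)

        def alternate(left, right, iterations=5):
            # Each subproblem is re-solved using the other's provisional result as a dynamic prior.
            segmentation = np.zeros(left.shape, dtype=bool)
            disparity = np.zeros(left.shape, dtype=float)
            for _ in range(iterations):
                segmentation = minimize_segmentation_energy(left, disparity)
                disparity = minimize_stereo_energy(left, right, segmentation)
            return segmentation, disparity

        left = np.random.rand(120, 160)
        right = np.random.rand(120, 160)
        seg, disp = alternate(left, right)
        print(seg.sum(), disp.mean())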

    β-Multivariational Autoencoder for Entangled Representation Learning in Video Frames

    Full text link
    It is crucial to choose actions from an appropriate distribution while learning a sequential decision-making process in which a set of actions is expected given the states and previous reward. Yet, if there are more than two latent variables and every two variables have a covariance value, learning a known prior from data becomes challenging. This is because, when the data are large and diverse, many posterior estimation methods experience posterior collapse. In this paper, we propose the β-Multivariational Autoencoder (βMVAE) to learn a Multivariate Gaussian prior from video frames, for use as part of single-object tracking framed as a decision-making process. We present a novel formulation for object motion in videos with a set of dependent parameters to address a single-object tracking task. The true values of the motion parameters are obtained through data analysis on the training set. The parameter population is then assumed to have a Multivariate Gaussian distribution. The βMVAE is developed to learn this entangled prior p = N(μ, Σ) directly from frame patches, where the output is the object masks of the frame patches. We devise a bottleneck to estimate the posterior's parameters, i.e. μ', Σ'. Via a new reparameterization trick, we learn the likelihood p(x̂|z) as the object mask of the input. Furthermore, we modify the βMVAE network with the U-Net architecture and name the new network the β-Multivariational U-Net (βMVUnet). Our networks are trained from scratch on over 85k video frames, for 24 million (βMVUnet) and 78 million (βMVAE) steps. We show that βMVUnet enhances both posterior estimation and segmentation performance on the test set. Our code and the trained networks are publicly released.
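
    The reparameterization step can be illustrated with a minimal NumPy sketch of sampling from a full-covariance Gaussian posterior via a Cholesky factor. The shapes, names, and the standard Cholesky-based trick shown here are illustrative assumptions, not the paper's specific formulation.

        import numpy as np

        def reparameterize(mu, Sigma, rng):
            # Sigma = L @ L.T (Cholesky); z = mu + L @ eps keeps the sample a
            # deterministic function of (mu, Sigma) plus external noise, so
            # gradients could flow through mu and Sigma in an autodiff framework.
            L = np.linalg.cholesky(Sigma)
            eps = rng.standard_normal(mu.shape[0])
            return mu + L @ eps

        rng = np.random.default_rng(0)
        mu = np.zeros(4)
        A = rng.standard_normal((4, 4))
        Sigma = A @ A.T + 1e-3 * np.eye(4)   # symmetric positive-definite covariance
        z = reparameterize(mu, Sigma, rng)
        print(z)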

    Future Video Prediction from a Single Frame for Video Anomaly Detection

    Full text link
    Video anomaly detection (VAD) is an important but challenging task in computer vision. The main challenge arises from the rarity of training samples covering all anomaly cases. Hence, semi-supervised anomaly detection methods have received more attention, since they focus on modeling normal patterns and detect anomalies by measuring deviations from them. Despite the impressive advances of these methods in modeling normal motion and appearance, long-term motion modeling has not been effectively explored so far. Inspired by the abilities of the future-frame-prediction proxy task, we introduce the task of future video prediction from a single frame as a novel proxy task for video anomaly detection. This proxy task alleviates the challenges of previous methods in learning longer motion patterns. Moreover, we replace the initial and future raw frames with their corresponding semantic segmentation maps, which not only makes the method aware of object class but also makes the prediction task less complex for the model. Extensive experiments on the benchmark datasets (ShanghaiTech, UCSD-Ped1, and UCSD-Ped2) show the effectiveness of the method and the superiority of its performance compared to state-of-the-art prediction-based VAD methods.
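
    A minimal sketch of the prediction-based scoring idea, assuming a simple mean-squared-error deviation between a predicted frame (or segmentation map) and the observed one; the predictor itself is a placeholder, not the paper's network.

        import numpy as np

        def anomaly_score(predicted, observed):
            # Per-frame mean squared error; a larger deviation from the
            # prediction is treated as more anomalous.
            return float(np.mean((predicted - observed) ** 2))

        observed = np.random.rand(64, 64)
        predicted = observed + 0.01 * np.random.randn(64, 64)  # stand-in for a learned predictor
        print(anomaly_score(predicted, observed))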

    Reproducible Evaluation of Pan-Tilt-Zoom Tracking

    Get PDF
    Tracking with a Pan-Tilt-Zoom (PTZ) camera has been a research topic in computer vision for many years. However, it is very difficult to assess the progress that has been made on this topic because there is no standard evaluation methodology. The difficulty in evaluating PTZ tracking algorithms arises from their dynamic nature. In contrast to other forms of tracking, PTZ tracking involves both locating the target in the image and controlling the motors of the camera to aim it so that the target stays in its field of view. This type of tracking can only be performed online. In this paper, we propose a new evaluation framework based on a virtual PTZ camera. With this framework, tracking scenarios do not change for each experiment and we are able to replicate online PTZ camera control and behavior, including camera positioning delays, tracker processing delays, and numerical zoom. We tested our evaluation framework with the Camshift tracker to show its viability and to establish baseline results. Comment: This is an extended version of the 2015 ICIP paper "Reproducible Evaluation of Pan-Tilt-Zoom Tracking".
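
    The effect of camera-positioning and tracker-processing delays can be sketched with a toy replay loop in which pan commands take effect only after a fixed number of steps. The delay values and the single pan angle are illustrative assumptions, not the framework's actual camera model.

        from collections import deque

        def run_scenario(target_angles, positioning_delay=2, processing_delay=1):
            # Commands take effect only after the combined tracker-processing
            # and camera-positioning delay, so the camera always lags the target.
            pending = deque([0.0] * (positioning_delay + processing_delay))
            log = []
            for target in target_angles:
                pending.append(target)            # tracker issues a new pan command
                camera_angle = pending.popleft()  # camera reaches an older command
                log.append((target, camera_angle))
            return log

        for target, reached in run_scenario([0, 5, 10, 15, 20, 25]):
            print(f"target={target:>3}  camera_at={reached:>5.1f}")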

    The Long-Baseline Neutrino Experiment: Exploring Fundamental Symmetries of the Universe

    Get PDF
    The preponderance of matter over antimatter in the early Universe, the dynamics of the supernova bursts that produced the heavy elements necessary for life, and whether protons eventually decay --- these mysteries at the forefront of particle physics and astrophysics are key to understanding the early evolution of our Universe, its current state and its eventual fate. The Long-Baseline Neutrino Experiment (LBNE) represents an extensively developed plan for a world-class experiment dedicated to addressing these questions. LBNE is conceived around three central components: (1) a new, high-intensity neutrino source generated from a megawatt-class proton accelerator at Fermi National Accelerator Laboratory, (2) a near neutrino detector just downstream of the source, and (3) a massive liquid argon time-projection chamber deployed as a far detector deep underground at the Sanford Underground Research Facility. This facility, located at the site of the former Homestake Mine in Lead, South Dakota, is approximately 1,300 km from the neutrino source at Fermilab -- a distance (baseline) that delivers optimal sensitivity to neutrino charge-parity symmetry violation and mass ordering effects. This ambitious yet cost-effective design incorporates scalability and flexibility and can accommodate a variety of upgrades and contributions. With its exceptional combination of experimental configuration, technical capabilities, and potential for transformative discoveries, LBNE promises to be a vital facility for the field of particle physics worldwide, providing physicists from around the globe with opportunities to collaborate in a twenty- to thirty-year program of exciting science. In this document we provide a comprehensive overview of LBNE's scientific objectives, its place in the landscape of neutrino physics worldwide, the technologies it will incorporate and the capabilities it will possess. Comment: Major update of the previous version. This is the reference document for the LBNE science program and its current status. Chapters 1, 3, and 9 provide a comprehensive overview of LBNE's scientific objectives, its place in the landscape of neutrino physics worldwide, the technologies it will incorporate and the capabilities it will possess. 288 pages, 116 figures.

    Multi-Task Learning based Video Anomaly Detection with Attention

    Full text link
    Multi-task learning based video anomaly detection methods combine multiple proxy tasks in different branches to detect video anomalies in different situations. Most existing methods either do not combine complementary tasks to effectively cover all motion patterns, or do not explicitly consider the class of the objects. To address these shortcomings, we propose a novel multi-task learning based method that combines complementary proxy tasks to better capture motion and appearance features. We combine the semantic segmentation and future frame prediction tasks in a single branch to learn the object class and consistent motion patterns, and to detect the respective anomalies simultaneously. In the second branch, we add several attention mechanisms to detect motion anomalies with attention to object parts, the direction of motion, and the distance of the objects from the camera. Our qualitative results show that the proposed method considers the object class effectively and learns motion with attention to the aforementioned important factors, which results in precise motion modeling and better motion anomaly detection. Additionally, quantitative results show the superiority of our method compared with state-of-the-art methods.
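
    A minimal sketch of one way per-branch anomaly scores could be fused into a single frame-level score; the min-max normalization and equal weighting are illustrative assumptions, not the paper's fusion rule.

        import numpy as np

        def fuse_scores(branch_a, branch_b, weight=0.5):
            # Min-max normalize each branch before a weighted combination so
            # that neither branch dominates purely because of its scale.
            def normalize(s):
                s = np.asarray(s, dtype=float)
                return (s - s.min()) / (s.max() - s.min() + 1e-8)
            return weight * normalize(branch_a) + (1.0 - weight) * normalize(branch_b)

        frame_scores = fuse_scores([0.2, 0.8, 0.3], [1.5, 4.0, 2.0])
        print(frame_scores)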

    Extraction of volumetric structures in an illuminance image

    Get PDF
    An original method is proposed to extract the most significant volumetric structures in an illuminance image. The method proceeds in three levels of organization managed by generic grouping principles: (i) from the illuminance image to a more compact representation of its contents by generic structural information extraction, leading to a basic contour primitive map; (ii) grouping of the basic primitives in order to form intermediate primitives, the contour junctions; (iii) grouping of these junctions in order to build the high-level contour primitives, the generic volumetric structures. Experimental results for various images of cluttered scenes show an ability to properly extract the structures of volumetric objects or parts with planar and curved surfaces.
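
    The three-level organization can be sketched as a simple data-flow skeleton; every stage below is a hypothetical placeholder that only illustrates how primitives would be passed from one grouping level to the next, not the paper's grouping principles.

        def extract_contour_primitives(image):
            # Level (i): placeholder for generic structural information extraction.
            return ["contour_%d" % i for i in range(4)]

        def group_into_junctions(contours):
            # Level (ii): placeholder grouping of basic primitives into junctions.
            return list(zip(contours, contours[1:]))

        def group_into_volumetric_structures(junctions):
            # Level (iii): placeholder grouping of junctions into volumetric structures.
            return [tuple(junctions)]

        illuminance_image = None  # stand-in for an actual illuminance image
        structures = group_into_volumetric_structures(
            group_into_junctions(extract_contour_primitives(illuminance_image)))
        print(structures)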

    Generic Multi-scale Segmentation and Curve Approximation Method

    No full text
    We propose a new complete method to extract significant description(s) of planar curves according to constant curvature segments. The method is based (i) on a multi-scale segmentation and curve approximation algorithm, defined by two grouping processes (polygonal and constant curvature approximations), leading to a multi-scale covering of the curve, and (ii) on an intra- and inter-scale classification of this multi-scale covering, guided by heuristically defined qualitative labels, leading to pairs (scale, list of constant curvature segments) that best describe the shape of the curve. Experiments show that the proposed method is able to provide salient segmentation and approximation results which respect shape description and recognition criteria.
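
    A minimal sketch of the constant-curvature primitive the method builds on: discrete curvature is estimated from consecutive point triples and the curve is split where the curvature jumps. The threshold rule is an illustrative stand-in for the paper's multi-scale grouping processes.

        import numpy as np

        def discrete_curvature(points):
            # Signed curvature of the circle through each triple of consecutive points:
            # k = 2 * cross(p1 - p0, p2 - p0) / (|p1 - p0| * |p2 - p1| * |p2 - p0|).
            p0, p1, p2 = points[:-2], points[1:-1], points[2:]
            a = np.linalg.norm(p1 - p0, axis=1)
            b = np.linalg.norm(p2 - p1, axis=1)
            c = np.linalg.norm(p2 - p0, axis=1)
            cross = (p1 - p0)[:, 0] * (p2 - p0)[:, 1] - (p1 - p0)[:, 1] * (p2 - p0)[:, 0]
            return 2.0 * cross / (a * b * c + 1e-12)

        def split_constant_curvature(points, tol=0.05):
            # Start a new segment wherever the curvature jumps by more than `tol`.
            k = discrete_curvature(points)
            cuts = [0] + [i + 1 for i in range(1, len(k)) if abs(k[i] - k[i - 1]) > tol]
            return cuts + [len(points) - 1]

        t = np.linspace(0.0, np.pi, 100)
        arc = np.column_stack([np.cos(t), np.sin(t)])  # circular arc: one constant-curvature segment
        print(split_constant_curvature(arc))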